EAML: ensemble self-attention-based mutual learning network for document image classification

Abstract

In the recent past, complex deep neural networks have received huge interest in various document understanding tasks such as document image classification and document retrieval. As many document types have a distinct visual style, learning only visual features with deep CNNs to classify document images has encountered the problem of low inter-class discrimination and high intra-class structural variations between its categories. In parallel, text-level understanding jointly learned with the corresponding visual properties within a given document image has considerably improved classification performance in terms of accuracy. In this paper, we design a self-attention-based fusion module that serves as a block in our ensemble trainable network. It allows the network to simultaneously learn the discriminant features of the image and text modalities throughout the training stage. Besides, we encourage mutual learning by transferring positive knowledge between the image and text modalities during the training stage. This constraint is realized by adding a truncated Kullback–Leibler divergence loss (Tr-$\mathrm{KLD}_{\mathrm{Reg}}$) as a new regularization term to the conventional supervised setting. To the best of our knowledge, this is the first time a mutual learning approach has been leveraged along with a self-attention-based fusion module to perform document image classification. The experimental results illustrate the effectiveness of our approach in terms of accuracy for the single-modal and multi-modal modalities. Thus, the proposed ensemble self-attention-based mutual learning model outperforms the state-of-the-art classification results on the benchmark RVL-CDIP and Tobacco-3482 datasets.
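
As a rough illustration of the mutual learning constraint described in the abstract, the sketch below adds a Kullback–Leibler term between the image and text branch predictions to each branch's supervised cross-entropy loss. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not define how the divergence is truncated, so the `cap` parameter (a simple upper clamp) is a hypothetical stand-in, and the names `mutual_learning_losses`, `weight`, and `cap` are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Supervised loss: negative log-probability of the true class."""
    return -math.log(probs[label])

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_learning_losses(image_logits, text_logits, label, weight=1.0, cap=None):
    """Return (image_loss, text_loss) for one sample.

    Each branch gets cross-entropy plus a KL term pulling it toward the
    other branch's prediction. ASSUMPTION: `cap` clamps the KL term from
    above as a placeholder for the paper's truncation, which the abstract
    does not specify.
    """
    p_img = softmax(image_logits)
    p_txt = softmax(text_logits)
    kl_img = kl_divergence(p_txt, p_img)  # image branch learns from text
    kl_txt = kl_divergence(p_img, p_txt)  # text branch learns from image
    if cap is not None:
        kl_img = min(kl_img, cap)
        kl_txt = min(kl_txt, cap)
    loss_img = cross_entropy(p_img, label) + weight * kl_img
    loss_txt = cross_entropy(p_txt, label) + weight * kl_txt
    return loss_img, loss_txt
```

When both branches produce the same prediction, the KL terms vanish and each loss reduces to plain cross-entropy; the regularizer only contributes when the two modalities disagree.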

Similar resources

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

Genre-based image classification using ensemble learning for online flyers

This paper presents an image classification model developed to classify images embedded in commercial real estate flyers. It is a component in a larger, multimodal system which uses texts as well as images in the flyers to automatically classify them by the property types. The role of the image classifier in the system is to provide the genres of the embedded images (map, schematic drawing, aer...

Tsallis Mutual Information for Document Classification

Mutual information is one of the most widely used measures for evaluating image similarity. In this paper, we investigate the application of three different Tsallis-based generalizations of mutual information to analyze the similarity between scanned documents. These three generalizations derive from the Kullback–Leibler distance, the difference between entropy and conditional entropy, and the Jense...

Ensemble LUT classification for degraded document enhancement

The fast evolution of scanning and computing technologies has led to the creation of large collections of scanned paper documents. Examples of such collections include historical collections, legal depositories, medical archives, and business archives. Moreover, in many situations, such as legal litigation and security investigations, scanned collections are being used to facilitate systematic e...

Self-Paced Learning for Semisupervised Image Classification

In this project, I plan to apply self-paced learning to the bounding-box problem using the VOC2011 dataset.

Journal

Journal title: International Journal on Document Analysis and Recognition

Year: 2021

ISSN: 1433-2833, 1433-2825

DOI: https://doi.org/10.1007/s10032-021-00378-0